Storing XML Documents in Databases

Authors

  • Albrecht Schmidt
  • Stefan Manegold
  • Martin L. Kersten
Abstract

INTRODUCTION

Ever since the Extensible Markup Language (XML) (W3C, 1998b) began to be used to exchange data between diverse sources, interest has grown in deploying data management technology to store and query XML documents. A number of approaches propose to adapt relational database technology to store XML documents. The advantage is that the XML repository inherits all the power of mature relational technology, such as indexes and transaction management. For XML-enabled querying, a declarative query language (Chamberlin et al., 2001) is available.

Traditionally, database technology has offered support for processing large amounts of data. Recent research has provided valuable insights into the nature of semistructured and XML data and has attempted to integrate them into existing paradigms. However, challenges remain before XML databases can scale to the production levels achieved by relational engines and thus gain acceptance among practitioners. Naturally, XML warehouses inherit the power of relational warehouses (Roussopoulos, 1997), but they also face the same challenges; in particular, the update and consistency problems of materialized, replicated, and aggregated views over source data need to be solved.

This article discusses techniques for loading XML documents into a document warehouse. All techniques build on well-understood relational database technology and enable efficient management of large XML repositories. To get the most out of relational database systems, we propose to do away with the pointer-chasing tree-traversal operations that many applications generate in the form of edit scripts, and to replace them with set-oriented operations.

Edit scripts (Chawathe et al., 1996; Chawathe & Garcia-Molina, 1997) have long been known in text databases and are similar in behavior to Document Object Model (DOM) (W3C, 1998a) traversals, which are standard in the XML world; they tend to put relational technology at a disadvantage because of their heavy use of pointer-chasing algorithms. We investigate the use of these scripts and propose alternative strategies for cases in which they perform poorly. A more detailed description of our experiments can be found in Schmidt and Kersten (2002). Our benchmarks of the system's performance show that edit scripts are sensible only if they update a rather small fraction of the database; once a certain threshold is exceeded, replacing a complete database segment is preferable. We discuss this threshold and try to quantify the trade-off for our example document database. The application scenario that motivates our research consists of a set …
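The contrast between node-at-a-time edit scripts and set-oriented segment replacement can be illustrated with a minimal sketch. It is not the authors' schema or implementation: the `node` table, the function names, and the use of SQLite as a stand-in relational engine are all assumptions made for illustration. Both update paths reach the same stored state; the abstract's point is that their costs diverge as the fraction of changed nodes grows.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text, doc_id):
    """Flatten a document into (doc_id, node_id, parent_id, tag, text) rows.

    Hypothetical edge-style mapping; node ids follow document order.
    """
    rows = []

    def walk(elem, parent_id):
        node_id = len(rows)
        rows.append((doc_id, node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)
    return rows

def open_store():
    """Create an in-memory store with one illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (doc_id INTEGER, node_id INTEGER, parent_id INTEGER,"
        " tag TEXT, text TEXT, PRIMARY KEY (doc_id, node_id))"
    )
    return conn

def load(conn, xml_text, doc_id):
    """Bulk-load one document in a single set-oriented insert."""
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", shred(xml_text, doc_id))

def apply_edit_script(conn, doc_id, edits):
    """Edit-script style: one UPDATE statement per changed node."""
    for node_id, new_text in edits:
        conn.execute(
            "UPDATE node SET text = ? WHERE doc_id = ? AND node_id = ?",
            (new_text, doc_id, node_id),
        )

def replace_document(conn, xml_text, doc_id):
    """Segment replacement: delete the old rows, then bulk-load the new version."""
    conn.execute("DELETE FROM node WHERE doc_id = ?", (doc_id,))
    load(conn, xml_text, doc_id)

def texts(conn, doc_id):
    """Return the text of every node of a document, in document order."""
    query = "SELECT text FROM node WHERE doc_id = ? ORDER BY node_id"
    return [row[0] for row in conn.execute(query, (doc_id,))]

OLD = "<items><item>a</item><item>b</item></items>"
NEW = "<items><item>a</item><item>c</item></items>"

# Path 1: patch the stored document node by node.
by_script = open_store()
load(by_script, OLD, 1)
apply_edit_script(by_script, 1, [(2, "c")])  # node 2 is the second <item>

# Path 2: replace the whole stored segment with the new version.
by_replace = open_store()
load(by_replace, OLD, 1)
replace_document(by_replace, NEW, 1)
```

With a single changed node the edit script touches one row, while replacement rewrites all of them; with many changed nodes the per-statement overhead of the script accumulates, which is the trade-off the threshold in the abstract refers to. The sketch does not measure that threshold.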

Similar resources

Storing Multidimensional XML Documents in Relational Databases

The problem of storing and querying XML data using relational databases has been studied extensively, and many techniques have been developed. MXML is an extension of XML suitable for representing data that assume different facets, having different value and structure under different contexts, which are determined by assigning values to a number of dimensions. In this paper, we explore techniques f...

XRecursive: An Efficient Method to Store and Query XML Documents

Storing XML documents in a relational database is a promising solution because relational databases are mature and scale very well. They also have the advantage that XML data and structured data can coexist in the same database, making it possible to build applications that involve both kinds of data with little extra effort. In this paper, we propose an algorithm schema named XRecursive that...

NATIVE XML DATABASES vs. RELATIONAL DATABASES IN DEALING WITH XML DOCUMENTS

When dealing with data-centric XML documents, it is possible to convert XML documents into a relational database, which can then be queried using SQL. Such relational databases are called XML-enabled databases. On the other hand, the best choice for storing, updating and retrieving document-centric XML documents is usually a native XML database (NXD). NXDs store XML documents as logical units, ...

A Middleware Approach to Storing and Querying XML Documents in Relational Databases

In this paper we present a middleware for storing and retrieving XML documents in relational databases. To store XML documents in an RDBMS, several mapping approaches can be used. We chose a structure-independent approach. This approach stores XML documents in fixed-schema tables and does not require a direct extension of SQL. So the middleware can be used with any RDBMS with minor changes in the in...

Model-Mapping Approaches for Storing and Querying XML Documents in Relational Database: A Survey

Extensible Markup Language (XML), which is recommended by the World Wide Web Consortium (W3C), has rapidly become the dominant standard for data interchange and data representation on the web. At present, with the growing use of XML data on the web, the size of this type of data is increasing rapidly, and users issue more complex queries on this data. Therefore, the demand to manage this data i...

Storing-updating and Querying Multidimensional Xml Documents Using Relational Databases

In Web applications it is often required to manipulate information of semistructured nature, which may present variations according to different circumstances. Multidimensional XML (MXML) is an extension of XML suitable for representing data that assume different facets, having different value and structure, under different contexts. Following previous work on storing XML in relational database...

Publication date: 2005